[HBASE-25357] allow specifying binary row key range to pre-split regions #72

Open
Dieken wants to merge 1 commit into master from specify-binary-row-key-range

@Dieken Dieken commented Nov 1, 2020

For example, when the row key starts with a long integer, we can specify
a binary key range to pre-split regions:

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.util.Bytes;

df.write()
  .format("org.apache.hadoop.hbase.spark")
  .option(HBaseTableCatalog.tableCatalog(), catalog)
  .option(HBaseTableCatalog.newTable(), 5)
  .option(HBaseTableCatalog.regionStart(), new String(Bytes.toBytes(0L), StandardCharsets.ISO_8859_1))
  .option(HBaseTableCatalog.regionEnd(), new String(Bytes.toBytes(2000000L), StandardCharsets.ISO_8859_1))
  .mode(SaveMode.Append)
  .save();
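
As a rough illustration of what pre-splitting this range means, the sketch below computes evenly spaced region boundaries between the two long prefixes. It assumes the start key becomes the end of the first region and the end key becomes the start of the last region, so 5 regions need 4 boundaries. `SplitSketch`, `longToBytes`, and `boundaries` are hypothetical names used only here, and the boundaries HBase actually computes may differ.

```java
import java.nio.ByteBuffer;

public class SplitSketch {
    // Big-endian 8-byte encoding of a long (assumed to match Bytes.toBytes(long)).
    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Returns numRegions - 1 evenly spaced boundaries from start to end inclusive.
    // Assumes (end - start) * (numRegions - 2) fits in a long.
    static long[] boundaries(long start, long end, int numRegions) {
        if (numRegions < 3 || start >= end) {
            throw new IllegalArgumentException("need numRegions >= 3 and start < end");
        }
        int count = numRegions - 1;
        long[] result = new long[count];
        for (int i = 0; i < count; i++) {
            result[i] = start + (end - start) * i / (count - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each boundary with the hex form of its 8-byte key prefix.
        for (long b : boundaries(0L, 2000000L, 5)) {
            StringBuilder hex = new StringBuilder();
            for (byte x : longToBytes(b)) {
                hex.append(String.format("%02x", x));
            }
            System.out.println(b + " -> " + hex);
        }
    }
}
```

Because the encoding is big-endian, the byte-wise order of these prefixes matches the numeric order of non-negative longs, which is what makes a long range usable as a row key range.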

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 4s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 ⚠️ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 1m 23s | master passed |
| +1 💚 | compile | 0m 37s | master passed |
| +1 💚 | scaladoc | 0m 17s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 44s | the patch passed |
| +1 💚 | compile | 0m 35s | the patch passed |
| +1 💚 | scalac | 0m 35s | the patch passed |
| -1 ❌ | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
| +1 💚 | scaladoc | 0m 16s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 3m 55s | hbase-spark in the patch passed. |
| | | 9m 0s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #72 |
| Optional Tests | dupname scalac scaladoc unit compile |
| uname | Linux 641d10548c92 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | hb_maven |
| Personality | dev-support/jenkins/hbase-personality.sh |
| git revision | master / b9706c8 |
| whitespace | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/artifact/yetus-precommit-check/output/whitespace-eol.txt |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/testReport/ |
| Max. process+thread count | 915 (vs. ulimit of 12500) |
| modules | C: spark/hbase-spark U: spark/hbase-spark |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Dieken Dieken force-pushed the specify-binary-row-key-range branch from 3244622 to 96bfc39 on November 3, 2020 03:51
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 4m 59s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 ⚠️ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 12m 25s | master passed |
| +1 💚 | compile | 2m 18s | master passed |
| +1 💚 | scaladoc | 0m 23s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 2m 6s | the patch passed |
| +1 💚 | compile | 1m 49s | the patch passed |
| +1 💚 | scalac | 1m 49s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | scaladoc | 0m 21s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 24m 8s | hbase-spark in the patch passed. |
| | | 48m 44s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/2/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #72 |
| Optional Tests | dupname scalac scaladoc unit compile |
| uname | Linux 5951f3910863 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | hb_maven |
| Personality | dev-support/jenkins/hbase-personality.sh |
| git revision | master / b9706c8 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/2/testReport/ |
| Max. process+thread count | 826 (vs. ulimit of 12500) |
| modules | C: spark/hbase-spark U: spark/hbase-spark |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/2/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@meszibalu
Contributor

@Dieken please create a Jira for this change if you want to get it merged. Thank you!

@Dieken Dieken changed the title from "allow specifying binary row key range to pre-split regions" to "[HBASE-25357] allow specifying binary row key range to pre-split regions" Dec 4, 2020
@Dieken Dieken force-pushed the specify-binary-row-key-range branch from 96bfc39 to 41f2156 on December 4, 2020 04:17
@Dieken
Author

Dieken commented Dec 4, 2020

> @Dieken please create a Jira for this change if you want to get it merged. Thank you!

Created https://issues.apache.org/jira/browse/HBASE-25357

@meszibalu

    parameters.get(HBaseTableCatalog.regionEnd)
      .getOrElse(HBaseTableCatalog.defaultRegionEnd))
    val startKey = parameters.get(HBaseTableCatalog.regionStart)
      .getOrElse(HBaseTableCatalog.defaultRegionStart).getBytes(StandardCharsets.ISO_8859_1)

@wchevreuil wchevreuil Dec 4, 2020

I'm not sure it is a good idea to use an encoding different from the default used by the Bytes util converter (StandardCharsets.UTF_8). Many pieces of HBase code rely on the Bytes converter, so comparisons may become inconsistent.

Also, why are you using a different converter here? Can you elaborate on the issue you are having with the built-in Bytes converter?
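
The concern can be made concrete with plain JDK calls: the same Java String can produce different bytes depending on the charset used to encode it. This is a minimal sketch, with `CharsetMismatch` and `sameBytes` as illustrative names only; it does not call the HBase Bytes utility.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetMismatch {
    // True when the string encodes to identical bytes under UTF-8 and ISO_8859_1.
    static boolean sameBytes(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.UTF_8),
                             s.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("row-0001")); // pure ASCII key
        System.out.println(sameBytes("\u00e9"));   // Latin-1 letter: two bytes in UTF-8, one in ISO_8859_1
        System.out.println(sameBytes("\u4e2d"));   // outside Latin-1: ISO_8859_1 cannot encode it
    }
}
```

ASCII-only keys encode identically under both charsets, so only keys with characters above U+007F would be affected by a charset change.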

Author

Spark options pass parameters as strings and do not support passing bytes directly. I need to pass a binary row key, so I have to interpret the binary bytes as an ISO_8859_1-encoded String, because they are not valid UTF-8.

It's a trick, and it does break backward compatibility for UTF-8 strings containing characters beyond the ISO_8859_1 charset; such a UTF-8 string must be wrapped as explained in the JIRA issue.

I can't figure out a better way to pass bytes in a Spark option.
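
The round trip described here can be checked with the JDK alone. The sketch below uses hypothetical helpers `wrap` and `unwrap`, and `ByteBuffer` stands in for `Bytes.toBytes(long)` on the assumption that both produce the same big-endian bytes. The last case shows one possible reading of "wrapped" for a UTF-8 text key; the JIRA issue may describe it differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryOptionSketch {
    // Carry arbitrary bytes in a String: each byte maps to one char in 0..255.
    static String wrap(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // What the receiving side would do to recover the original bytes.
    static byte[] unwrap(String option) {
        return option.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Same round trip through UTF-8, which replaces malformed byte sequences.
    static byte[] viaUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = ByteBuffer.allocate(Long.BYTES).putLong(2000000L).array();
        // ISO_8859_1 preserves every byte of the binary key.
        System.out.println(Arrays.equals(key, unwrap(wrap(key))));
        // UTF-8 does not: 0x84 and 0x80 are not valid on their own.
        System.out.println(Arrays.equals(key, viaUtf8(key)));
        // A text key outside Latin-1 survives if its UTF-8 bytes are wrapped first.
        byte[] text = "\u4e2d".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(text, unwrap(wrap(text))));
    }
}
```

The design trade-off is that the option value stops being human-readable text and becomes a byte container, so callers passing non-Latin-1 text need the extra wrapping step.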

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 1s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 ⚠️ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 1m 27s | master passed |
| +1 💚 | compile | 0m 37s | master passed |
| +1 💚 | scaladoc | 0m 46s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 45s | the patch passed |
| +1 💚 | compile | 0m 39s | the patch passed |
| +1 💚 | scalac | 0m 39s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | scaladoc | 0m 46s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 7m 3s | hbase-spark in the patch passed. |
| | | 13m 48s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #72 |
| Optional Tests | dupname scalac scaladoc unit compile |
| uname | Linux b9487a03e2cc 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | hb_maven |
| Personality | dev-support/jenkins/hbase-personality.sh |
| git revision | master / fddb433 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/testReport/ |
| Max. process+thread count | 918 (vs. ulimit of 12500) |
| modules | C: spark/hbase-spark U: spark/hbase-spark |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 43s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 ⚠️ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 1m 55s | master passed |
| +1 💚 | compile | 0m 49s | master passed |
| +1 💚 | scaladoc | 0m 54s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 55s | the patch passed |
| +1 💚 | compile | 0m 48s | the patch passed |
| +1 💚 | scalac | 0m 48s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | scaladoc | 0m 57s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 7m 19s | hbase-spark in the patch passed. |
| | | 16m 14s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #72 |
| Optional Tests | dupname scalac scaladoc unit compile |
| uname | Linux 4cf38c84d016 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | hb_maven |
| Personality | dev-support/jenkins/hbase-personality.sh |
| git revision | master / fddb433 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/testReport/ |
| Max. process+thread count | 947 (vs. ulimit of 12500) |
| modules | C: spark/hbase-spark U: spark/hbase-spark |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 0s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 ⚠️ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 1m 39s | master passed |
| +1 💚 | compile | 0m 37s | master passed |
| +1 💚 | scaladoc | 0m 45s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 47s | the patch passed |
| +1 💚 | compile | 0m 38s | the patch passed |
| +1 💚 | scalac | 0m 38s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | scaladoc | 0m 48s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 7m 6s | hbase-spark in the patch passed. |
| | | 14m 1s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #72 |
| Optional Tests | dupname scalac scaladoc unit compile |
| uname | Linux b17eab94ab3b 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | hb_maven |
| Personality | dev-support/jenkins/hbase-personality.sh |
| git revision | master / 37aa8d5 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/testReport/ |
| Max. process+thread count | 916 (vs. ulimit of 12500) |
| modules | C: spark/hbase-spark U: spark/hbase-spark |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 41s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -0 ⚠️ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 3m 49s | master passed |
| +1 💚 | compile | 0m 36s | master passed |
| +1 💚 | scaladoc | 0m 45s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 44s | the patch passed |
| +1 💚 | compile | 0m 39s | the patch passed |
| +1 💚 | scalac | 0m 39s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | scaladoc | 0m 46s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 7m 24s | hbase-spark in the patch passed. |
| | | 17m 5s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/artifact/yetus-precommit-check/output/Dockerfile |
| GITHUB PR | #72 |
| Optional Tests | dupname scalac scaladoc unit compile |
| uname | Linux 6e1d66f37f33 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 GNU/Linux |
| Build tool | hb_maven |
| Personality | dev-support/jenkins/hbase-personality.sh |
| git revision | master / 2bfc5f1 |
| Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/testReport/ |
| Max. process+thread count | 917 (vs. ulimit of 12500) |
| modules | C: spark/hbase-spark U: spark/hbase-spark |
| Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-Connectors-PreCommit/job/PR-72/1/console |
| versions | git=2.20.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

4 participants